Convex Group Clustering of Large Geo-referenced Data Sets

Author

  • Vladimir Estivill-Castro
Abstract

Clustering partitions a data set $S = \{s_1, \ldots, s_n\} \subset \Re^m$ into groups of nearby points. Distance-based clustering uses optimisation criteria for defining the quality of the partition. Formulations using representatives (means or medians of groups) have received much more attention than minimisation of the total within-group distance (TWGD). However, this non-representative approach has attractive properties while remaining distance-based. While representative approaches produce partitions with non-overlapping clusters, TWGD does not. We investigate the restriction of TWGD to producing convex-hull disjoint groups and show that this problem is NP-complete in the Euclidean case as soon as $m \geq 2$. Nevertheless, we provide efficient algorithms for solving it approximately.

1 Introduction

Clustering is a fundamental task in data analysis since it identifies groups in heterogeneous data. Clustering can be seen as a concept formation or class delineation problem. At least the fields of statistics [44, 46], machine intelligence [5, 15, 32] and, more recently, knowledge discovery and data mining (KDDM) [12, 14, 37, 47] have contributed algorithms for many clustering approaches. Hierarchical bottom-up approaches form groups by composition, merging items that are close together [10, 29]. However, top-down partition approaches to clustering are also interesting, in particular for spatial data mining [12, 37, 48]. This perspective defines clustering as partitioning a heterogeneous data set into smaller, more homogeneous groups [2, 19, 40].

Clustering typically uses a metric (or distance) to determine the dissimilarity between the items to be clustered. Here we consider the clustering problem in the context of spatial databases, those typically associated with a Geographical Information System (GIS). In spatial settings, the clustering almost invariably makes use of some distance that captures the notion of proximity, as it reflects the essence of
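To make the two notions in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that TWGD is the sum of Euclidean distances over all unordered pairs of points sharing a group, and it restricts the convex-hull disjointness test to the planar case (m = 2). The function names `twgd`, `convex_hull` and `hulls_disjoint` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def twgd(groups):
    """Total within-group distance (assumed definition): sum of Euclidean
    distances over all unordered pairs of points in the same group."""
    return sum(dist(p, q) for g in groups for p, q in combinations(g, 2))


def convex_hull(points):
    """Andrew's monotone chain in the plane; returns hull vertices in
    counter-clockwise order (one or two points for degenerate inputs)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]


def hulls_disjoint(a, b):
    """True if the convex hulls of planar point sets a and b share no point.
    Uses a separating-axis test; edge directions and the vector between two
    hull vertices are added as axes so degenerate hulls are handled too."""
    ha, hb = convex_hull(a), convex_hull(b)
    axes = [(hb[0][0] - ha[0][0], hb[0][1] - ha[0][1])]
    for h in (ha, hb):
        for i in range(len(h)):
            (x1, y1), (x2, y2) = h[i], h[(i + 1) % len(h)]
            dx, dy = x2 - x1, y2 - y1
            axes += [(dx, dy), (-dy, dx)]  # edge direction and edge normal
    for ax, ay in axes:
        if ax == 0 and ay == 0:
            continue
        pa = [x * ax + y * ay for x, y in ha]
        pb = [x * ax + y * ay for x, y in hb]
        if max(pa) < min(pb) or max(pb) < min(pa):
            return True  # projections strictly separated on this axis
    return False


# Illustrative data: two tight triples, grouped two different ways.
left, right = [(0, 0), (1, 0), (0, 1)], [(5, 0), (6, 0), (5, 1)]
compact = [left, right]
mixed = [[(0, 0), (6, 0), (0, 1)], [(1, 0), (5, 0), (5, 1)]]
print(twgd(compact), hulls_disjoint(*compact))
print(twgd(mixed), hulls_disjoint(*mixed))
```

In this toy data the compact grouping has both the smaller TWGD and disjoint hulls, while the mixed grouping has intersecting hulls. The sketch only evaluates a given partition; it does not search for one, which is the hard part the abstract refers to.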


Similar resources

Convex group clustering of large geo-referenced data sets

Clustering partitions a data set $S = \{s_1, \ldots, s_n\} \subset \Re^m$ into groups of nearby points. Distance-based clustering methods use optimisation criteria to define the quality of a partition. Formulations using representatives (means or medians of groups) have received much more attention than minimisation of the total within-group distance (TWGD). However, this non-representative approach has attractiv...


An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.


Convex structures via convex $L$-subgroups of an $L$-ordered group

In this paper, we first characterize the convex $L$-subgroup of an $L$-ordered group by means of four kinds of cut sets of an $L$-subset. Then we consider the homomorphic preimages and the product of convex $L$-subgroups. After that, we introduce an $L$-convex structure constructed by convex $L$-subgroups. Furthermore, the notion of the degree to which an $L$-subset of an $L$-ord...


Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Given that the main weakness of most standard methods, including k-means and hierarchical data clustering, is their sensitivity to initialization and their tendency to become trapped in local minima, this paper proposes a modification of convex data clustering in which there is no need to be particular about how initial values are selected. Due to properly converting the task of optimization to an equivalent...


Data Mining Techniques for Autonomous Exploration of Large Volumes of Geo-referenced Crime Data

We incorporate two knowledge discovery techniques, clustering and association-rule mining, into a fruitful exploratory tool for the discovery of spatio-temporal patterns. This tool is an autonomous pattern detector to reveal plausible cause-effect associations between layers of point and area data. We present two methods for this exploratory analysis and we detail algorithms to effectively expl...



Publication date: 1999